Abstract. To protect sensitive information in a cross tabulated table, it is a common practice
to suppress some of the cells in the table. An analytic invariant is a power series in terms of the
suppressed cells that has a unique feasible value and a convergence radius equal to +∞. Intuitively,
the information contained in an invariant is not protected even though the values of the suppressed 
cells are not disclosed. This paper gives an optimal linear-time algorithm for testing whether there 
exist nontrivial analytic invariants in terms of the suppressed cells in a given set of suppressed 
cells. This paper also presents NP-completeness results and an almost linear-time algorithm for the 
problem of suppressing the minimum number of cells in addition to the sensitive ones so that the 
resulting table does not leak analytic invariant information about a given set of suppressed cells. 
1. Introduction. Cross tabulated tables are used in a wide variety of documents
to organize and exhibit information, often with the values of some cells suppressed in
order to conceal sensitive information. Concerned with the effectiveness of the practice
of cell suppression [12], statisticians have raised two fundamental issues and developed
computational heuristics to various related problems [|. 1. 1. §, [fi], 0, |H §f, © H • 



The detection issue is whether an adversary can deduce significant information about 
the suppressed cells from the published data of a table. The protection issue is how a 
table maker can suppress a small number of cells in addition to the sensitive ones so 
that the resulting table does not leak significant information. 

This paper investigates the complexity of how to protect a broad class of infor- 
mation contained in a two-dimensional table that publishes (1) the values of all cells 
except a set of sensitive ones, which are suppressed, and (2) an upper bound and a 
lower bound for each cell, and (3) all row sums and column sums of the complete set 
of cells. The cells may have real or integer values. They may have different bounds, 
and the bounds may be finite or infinite. The upper bound of a cell should be strictly 
greater than its lower bound; otherwise, the value of that cell is immediately known 
even if that cell is suppressed. The cells that are not suppressed also have upper 
and lower bounds. These bounds are necessary because some of the unsuppressed 
cells may later be suppressed to protect the information in the sensitive cells. (See
Figures 1.1 and 1.2 for an example of a complete table and its published version.)

An unbounded feasible assignment to a table is an assignment of values to the 
suppressed cells such that each row or column adds up to its published sum. A
bounded feasible assignment is an unbounded one that also obeys the bounds of the
suppressed cells. An analytic function of a table is a power series of the suppressed
cells, each regarded as a variable, such that the convergence radius is ∞ |l], ||, [2l],
p2| , p6| , |27j. An analytic invariant is an analytic function that has a unique value 
at all the bounded feasible assignments. If an analytic invariant is formed by a 
linear combination of the suppressed cells, then it is called a linear invariant [17], [19].
Similarly, a suppressed cell is called an invariant cell [11], [15] if it is an invariant by
itself. For instance, in the published table in Figure 1.2, let X_{p,q} be the cell at row
p and column q. The suppressed cell in row 6 is an invariant because it is the only
suppressed cell in that row. X_{2,c} and X_{3,c} are invariant cells because their values are
between 0 and 9.5, their sum is 19, and both cells are therefore forced to have the same
unique value 9.5. Consequently,
((X_{3,c}·X_{2,c} + 0.5·X_{2,c} − 95)^2 · X_{1,f}) + sin(X_{2,c}·X_{2,a} − 9.5·X_{2,a}) is also an invariant.
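The step that forces the two bounded cells to the value 9.5 can be written out as a short derivation. The shorthand x and y for the two cells is introduced here only for illustration; the derivation uses nothing beyond the stated sum 19 and the upper bound 9.5.

```latex
\begin{align*}
x + y &= 19, \qquad x \le 9.5, \qquad y \le 9.5 \\
x &= 19 - y \;\ge\; 19 - 9.5 \;=\; 9.5 \\
\Rightarrow\quad x &= 9.5, \qquad y = 19 - x = 9.5 .
\end{align*}
```

Any expression that vanishes whenever both cells equal 9.5 therefore takes a single value over all bounded feasible assignments.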

Intuitively, the information contained in an analytic invariant is unprotected be- 
cause its value can be uniquely deduced from the published data. In this paper, a set of 
suppressed cells is totally protected if there exists no analytic invariant in terms of the 
suppressed cells in the given set, except the trivial invariant that contains no nonzero 
terms. As the analytic power series form a very broad family of mathematical func- 
tions, total protection conceals from the adversary a very large class of information. 
This paper gives a very simple algorithm for testing whether a given set of suppressed
cells is totally protected. When a graph representation, called the suppressed graph,
of a table is given as input, this algorithm runs in optimal O(m + n) time, where
m is the number of suppressed cells and n is the total number of rows and columns.
This paper also considers the problem of computing and suppressing the minimum 
number of additional cells so that a given set of original suppressed cells becomes 
totally protected. This problem is shown to be NP-complete. For a large class of 
tables, this optimal suppression problem can be solved in O((m + n)·α(n, m + n))
time, where α is an inverse of Ackermann's function and its value is practically a small
constant |2|, [g, [16| . Moreover, for this class of tables, every optimal set of cells for
additional suppression forms a spanning forest of some sort. As a consequence, at 
most n − 1 additional cells need to be suppressed to achieve the total protection of a
given set of original suppressed cells. As the size of a table may grow quadratically
in n, the suppression of n − 1 additional cells is a negligible price to pay for total
protection for a reasonably large table. 

Previously, four other levels of data security have been considered that protect
information contained, respectively, in individual suppressed cells [11], [15], in a row
or column as a whole, in a set of k rows or k columns as a whole, and in a table
as a whole. These four levels of data security and total protection differ in two
major aspects. First, these four levels of data security primarily protect information 
expressible as linear invariants, whereas total protection protects the much broader 
class of analytic invariant information. Second, these four levels of data security em- 
phasize protecting regular regions of a table, whereas total protection protects any 
given set of suppressed cells and is more flexible. These four levels of data security and 
total protection share some interesting similarities. As total protection corresponds to 
spanning forests in suppressed graphs, these four levels of data security are equivalent 
to some forms of 2-edge connectivity [14], [15], 2-vertex connectivity, k-vertex
connectivity and graph completeness Jig] . In this paper, the NP-completeness results and
efficient algorithms for total protection rely heavily on its graph characterizations. 
Similarly, the equivalence characterizations of these four levels of data security have 
been key in obtaining efficient algorithms (IJ, |l5|, [l^] and NP-completeness proofs Jl^] 
for various detection and protection problems. 

Section 2 discusses basic concepts. Section 3 formally defines the notion of total
protection and gives a linear-time algorithm to test for this notion. Sections 4
and 5 give NP-completeness results and efficient algorithms for optimal suppression
problems of total protection. Section 6 concludes this paper with discussions.
2. Basics of two-dimensional tables. This section discusses basic relationships
between tables and graphs.

A mixed graph is one that may contain both undirected and directed edges. A 
traversable cycle or path in a mixed graph is one that can be traversed along the 
directions of its edges. A direction-blind cycle or path is one that can be traversed if 
the directions of its edges are disregarded. The word direction-blind is often omitted 
for brevity. A mixed graph is connected (respectively, strongly connected) if each pair 
of vertices are contained in a direction-blind path (respectively, traversable cycle) . A 
connected component (respectively, strongly connected component) of a mixed graph 
is a maximal subgraph that is connected (respectively, strongly connected). A set of 
edges in a mixed graph is an edge cut if its removal disconnects one or more connected 
components of that graph. An edge cut is a minimal one if it has no proper subset 
that is also an edge cut. 

From this point onwards, let T be a table, and let H' = (A, B, E') and H =
(A, B, E) be the bipartite mixed graphs constructed below. H' and H are called the
total graph and the suppressed graph of T, respectively [15]. For each row (respectively,
column) of T, there is a unique vertex in A (respectively, B). This vertex is called
a row (respectively, column) vertex. For each cell X_{i,j} at row i and column j in T,
there is a unique edge e in E' between the vertices of row i and column j. If the
value of X_{i,j} is strictly between its bounds, then e is undirected. Otherwise, if the
value is equal to the lower (respectively, upper) bound, then e is directed towards
its column (respectively, row) endpoint. Note that H' is a complete bipartite mixed
graph, i.e., there is exactly one edge between each pair of vertices from the two vertex
sets of the graph. The graph H is the subgraph of H' whose edge set consists of only
those edges corresponding to the suppressed cells of T. Figure 2.1 illustrates a table and
its suppressed graph. For convenience, a row or column of T will be regarded as a
vertex in H and a cell as an edge, and vice versa.
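The construction above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the edge labels "undirected", "to_column" and "to_row", and the small 2x2 table are assumptions made for this example and do not come from the paper.

```python
from typing import List, Tuple

# An edge is (row_index, column_index, kind). The three kind labels are
# hypothetical names chosen for this sketch.
Edge = Tuple[int, int, str]

def edge_kind(value: float, lower: float, upper: float) -> str:
    """Orient the edge of one cell by comparing its value with its bounds."""
    if lower < value < upper:
        return "undirected"
    if value == lower:
        return "to_column"   # value at the lower bound: directed towards the column vertex
    if value == upper:
        return "to_row"      # value at the upper bound: directed towards the row vertex
    raise ValueError("cell value lies outside its bounds")

def build_graphs(values, lower, upper, suppressed) -> Tuple[List[Edge], List[Edge]]:
    """Return (edges of the total graph, edges of the suppressed graph)."""
    total, supp = [], []
    for i, row in enumerate(values):
        for j, v in enumerate(row):
            e = (i, j, edge_kind(v, lower[i][j], upper[i][j]))
            total.append(e)              # every cell yields an edge of the total graph
            if suppressed[i][j]:
                supp.append(e)           # only suppressed cells stay in the suppressed graph
    return total, supp

# Made-up 2x2 table with bounds 0 and 2 on every cell.
values = [[1.0, 0.0], [2.0, 1.0]]
lower = [[0.0, 0.0], [0.0, 0.0]]
upper = [[2.0, 2.0], [2.0, 2.0]]
suppressed = [[True, True], [True, False]]
total_edges, suppressed_edges = build_graphs(values, lower, upper, suppressed)
```

The total graph receives one edge per cell, so it is complete bipartite by construction, and the suppressed graph is obtained by filtering the same edge list.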

Theorem 2.1 (p5[|). A suppressed cell of T is an invariant cell if and only if it
is not in an edge-simple traversable cycle of H.

The effective area of an analytic function F of T, denoted by EA(F), is the set of
variables in the nonzero terms of F. The function F is called nonzero if EA(F) ≠ ∅.
Note that because the convergence radius of F is ∞, EA(F) is independent of the
point at which F is expanded into a power series.

Theorem 2.2 (@). For every minimal edge cut Y of a strongly connected
component of H, T has a linear invariant F with EA(F) = Y.

The bounded kernel (respectively, unbounded kernel) of T, denoted by BK(T)
(respectively, UK(T)), is the real vector space consisting of all linear combinations
of x − y, where x and y are arbitrary bounded (respectively, unbounded) feasible
assignments of T.

Because H is bipartite, every cycle of H is of even length. Thus, the edges of
an edge-simple direction-blind cycle of H can be alternately labeled with +1 and −1.
Such a labeling is called a direction-blind labeling. A direction-blindly labeled cycle
is regarded as an assignment to the suppressed cells of T. If the corresponding edge
of a suppressed cell is in the given cycle, then the value assigned to that cell is the
label of that edge; otherwise, the value is 0. Note that this assignment need not be
an unbounded feasible assignment of T.
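A small Python sketch may help to see what a direction-blind labeling does to the row and column sums. The helper names and the 4-cycle below are assumptions made for illustration; the point is only that alternating labels cancel at every row and column vertex, which is the property shared by any difference of two unbounded feasible assignments.

```python
from collections import defaultdict

def label_cycle(cycle_cells):
    """Alternately label the cells (edges) of an even-length cycle with +1 and -1."""
    if len(cycle_cells) % 2 != 0:
        raise ValueError("a cycle in a bipartite graph has even length")
    return {cell: (1 if k % 2 == 0 else -1) for k, cell in enumerate(cycle_cells)}

def line_sums(labels):
    """Sum the labels along every row and every column touched by the cycle."""
    rows, cols = defaultdict(int), defaultdict(int)
    for (i, j), v in labels.items():
        rows[i] += v
        cols[j] += v
    return dict(rows), dict(cols)

# Hypothetical 4-cycle R0 - C0 - R1 - C1 - R0, listed edge by edge as (row, column) cells.
cycle = [(0, 0), (1, 0), (1, 1), (0, 1)]
labels = label_cycle(cycle)
row_sums, col_sums = line_sums(labels)
```

Consecutive edges of the cycle meet at a common vertex and carry opposite labels, so every row sum and every column sum of the labeled cycle is zero.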

Theorem 2.3 ((jU).

1. UK(T) = BK(T) if every connected component of H is strongly connected.

2. Every direction-blindly labeled cycle of H is a vector in UK(T).
3. Total protection. A set Q of suppressed cells of T is totally protected in T
if there is no nonzero analytic invariant F of T with EA(F) ⊆ Q. The goal of total
protection can be better understood by considering Q as the set of suppressed cells 
that contain sensitive data. The total protection of Q means that no precise analytic 
information about these data, not even their row and column sums, can be deduced 
from the published data of T. As analytic power series form a very large class of 
functions in mathematical sciences, this notion of protection requires a large class of 
information about Q to be concealed from the adversary. 

The next lemma and theorem characterize the notion of total protection in graph 
concepts. 

Lemma 3.1. If F is a nonzero analytic invariant of T such that the edges in
EA(F) are contained in the strongly connected components of H, then for some
strongly connected component D of H, EA(F) ∩ D is an edge cut of D.

Remark. The converse of this lemma is not true; for a counterexample, consider
the linear combination X\ tC1 + 2-Ai^ for the table in Figure 2.1. Also, if F is a
nonzero linear invariant, then for every strongly connected component D of H, the
set D ∩ EA(F) is either empty or is an edge cut of D JlTj.

Proof. Let T_s be the table constructed from T by also publishing the suppressed
cells that are not in the strongly connected components of H. By Theorem 2.1, F
remains a nonzero analytic function of T_s. Also, the connected components of the
suppressed graph H_s of T_s are the strongly connected components of H. Thus, to
prove the lemma, it suffices to prove it for T_s, H_s, and F.

Let x_0 be a fixed bounded feasible assignment of T_s. Let K = {x − x_0 | x is a
bounded feasible assignment of T_s}. Since F is an analytic invariant of T_s, the function
G(x) = F(x) − F(x_0) is an analytic invariant of T_s with EA(G) = EA(F) and its value
is zero over x_0 + K. Because K contains a nonempty open subset of BK(T_s), G is zero
over x_0 + BK(T_s). By Theorem 2.3(1) and the strong connectivity of the connected
components of H_s, BK(T_s) = UK(T_s) and G is zero over x_0 + UK(T_s). Thus, it
suffices to show that if D − EA(F) is connected for all connected components D of H_s,
then G(x_0 + z_0) ≠ 0 for some z_0 ∈ UK(T_s). To construct z_0, let EA(G) = {e_1, ..., e_k}.
Let D_i be the connected component of H_s that contains e_i. By the connectivity of
D_i − EA(F), there is a vertex-simple path P_i in D_i − EA(F) between the endpoints
of e_i. Let C_i be the vertex-simple cycle formed by e_i and P_i. Next, direction-blindly
label C_i with e_i labeled +1. Since G is a nonzero power series, G(x_0 + y_0) ≠ 0 for some
vector y_0. Note that y_0 is not necessarily in UK(T_s). So, let z_0 = Σ_{i=1}^{k} h_i·C_i, where
h_i is the component of y_0 at variable e_i. Then, by Theorem 2.3(2), z_0 ∈ UK(T_s).
Because P_i is in H_s − EA(F), e_i appears only in the term C_i in Σ_{i=1}^{k} h_i·C_i. Thus z_0
and y_0 have the same component values at the variables in EA(G). Since the variables
not in EA(G) do not appear in any expansion of G, G(x_0 + z_0) = G(x_0 + y_0) ≠ 0,
proving the lemma. □

Theorem 3.2. A set Q of suppressed cells is totally protected in T if and only
if the two statements below are both true:

1. The edges in Q are contained in the strongly connected components of H.

2. For each strongly connected component D of H, the graph D − Q is connected.

Proof. It is equivalent to show that Q is not totally protected if and only if
Q contains some edges not in the strongly connected components of H or for some
strongly connected component D of H, the graph D − Q is not connected. The
⇒ direction follows from Lemma 3.1. As for the ⇐ direction, if Q contains some edges
not in the strongly connected components of H, then by Theorem 2.1, Q contains
some invariant cells of T and thus cannot be totally protected. If for some strongly
connected component D of H, the graph D − Q is not connected, then some subset
Y of Q is a minimal edge cut of D. By Theorem 2.2, T has a linear invariant F with
EA(F) = Y and thus Q is not totally protected. □

This paper investigates the following two problems concerning how to achieve 
total protection. 

Problem 1 (Protection Test). 

• Input: The suppressed graph H and a set Q of suppressed cells of a table T.

• Output: Is Q totally protected in T?

Theorem 3.3. Problem 1 can be solved in linear time in the size of H.

Proof. This problem can be solved within the desired time bound by means of
Theorem 3.2 and linear-time algorithms for computing connected components and
strongly connected components [|[ ^[ ||, |l6) . □

Problem 2 (Optimal Suppression). 

• Input: A table T, a subset Q of E, and an integer p > 0, where E is the set
of all suppressed cells in T.

• Output: Is there a set P consisting of at most p published cells of T such
that Q is totally protected in the table formed from T by also suppressing
the cells in P?

This problem is clearly in NP. Section 4 shows that this problem with Q = E is
NP-complete. In contrast, Section 5 proves that if the total graph of T is undirected,
then this problem with general Q can be solved in almost linear time.

4. NP-completeness of optimal suppression. Throughout this section, the 
total graph of T may or may not be undirected. 

Theorem 4.1. Problem 2 with Q = E is NP-complete.

To prove this theorem, the idea is to first transform Problem 2 with Q = E to the
following graph problem and then prove the NP-completeness of the graph problem.
Problem 3. 

• Input: A complete bipartite mixed graph H' = (A, B, E'), a subgraph H =
(A, B, E), and an integer p > 0.

• Output: Does any set P of at most p edges in E' − E have the following two
properties?

Property N1: Every connected component of (A, B, E ∪ P) is strongly connected.

Property N2: The vertices of each connected component of H are connected
in (A, B, P), i.e., contained in a connected component of (A, B, P).

Lemma 4.2. Problem 2 with Q = E and Problem 3 can be reduced to each other
in linear time.

Proof. Given an instance T and p of Problem 2 with Q = E, the desired instance of
Problem 3 is the total graph H' = (A, B, E') and the suppressed graph H = (A, B, E)
of T, and p itself. This transformation can easily be computed in linear time. There
are two directions to show that it reduces Problem 2 to Problem 3. Assume that P
is a desired set for Problem 3. By Property N1, Statement 1 in Theorem 3.2 is true.
Also, every strongly connected component of (A, B, E ∪ P) is a union of edge-disjoint
connected components in H and (A, B, P). Therefore, by Property N2, Statement 2
of Theorem 3.2 holds. As a result, P itself is a desired set for Problem 2. On the other
hand, assume that P is a desired set for Problem 2. Let P' be the set of all edges in
P that are also in the strongly connected components of (A, B, E ∪ P). By Statement
1 of Theorem 3.2 and the total protection of E in the table with the cells in P also
suppressed, the connected components of
(A, B, E ∪ P') are the strongly connected components of (A, B, E ∪ P). Thus, P' holds
Property N1. Next, because a connected component of H is included in a strongly
connected component of (A, B, E ∪ P'), by Statement 2 of Theorem 3.2, P' also holds
Property N2 and thus is a desired set for Problem 3.

Given an instance H', H, and p of Problem 3, the desired instance of Problem 2
with Q = E is p itself and the table defined as follows. For each vertex in A
(respectively, B), there is a row (respectively, column). The upper and lower bounds for
each cell are 2 and 0. For each edge e in E', its corresponding cell is at the row and
column corresponding to its endpoints. The value of that cell is 1 (respectively, 0 and
2) if e is undirected (respectively, directed from A to B, or directed from B to A).
For each edge e in H, its corresponding cell is suppressed. Note that the total and
suppressed graphs of this table are H' and H themselves. Thus, the remaining proof
details for this reduction are essentially the same as for the other reduction. □

Both Problem 2 with Q = E and Problem 3 are clearly in NP. To prove their
completeness in NP, by Lemma 4.2 it suffices to reduce the following NP-complete
problem to Problem 3.

Problem 4 (Hitting Set Q). 

• Input: A finite set S, a nonempty family W of subsets of S, and an integer
h > 0.

• Output: Is there a subset S' of S such that |S'| ≤ h and S' contains at least
one element in each set in W?

Given an instance S = {s_1, ..., s_q}, W = {S_1, ..., S_r}, h of Problem 4, an
instance H' = (A, B, E'), H = (A, B, E), p of Problem 3 is constructed as follows:

• Rule 1: Let A = {a_0, a_1, ..., a_q}. The vertices a_1, ..., a_q correspond to
s_1, ..., s_q, but a_0 corresponds to no s_i.

• Rule 2: Let B = {b_0, b_1, ..., b_r}. The vertices b_1, ..., b_r correspond to
S_1, ..., S_r of W, but b_0 corresponds to no S_j.

• Rule 3: Let E' be the union of the following sets of edges:

1. {b_0 → a_0}.

2. {a_0 → b_j | ∀ j with 1 ≤ j ≤ r}.

3. {a_i → b_0 | ∀ i with 1 ≤ i ≤ q}.

4. {b_j → a_i | ∀ s_i and S_j with s_i ∈ S_j}.

5. {a_i → b_j | ∀ s_i and S_j with s_i ∉ S_j}.

• Rule 4: Let E = {a_0 → b_1, ..., a_0 → b_r}.

• Rule 5: Let p = h + r + 1.

The above construction can easily be computed in polynomial time. The next 
two lemmas show that it is indeed a desired reduction. 
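The five rules can be sketched as a short Python routine. This is an illustrative sketch only: the function name and the vertex encoding ("a", i) and ("b", j) are assumptions made here, and Rule 4 is taken to be E = {a_0 → b_1, ..., a_0 → b_r}, which is the reading consistent with the proof of Lemma 4.3, where {a_0, b_1, ..., b_r} is the only component of the suppressed graph with more than one vertex.

```python
def build_reduction(q, family, h):
    """Build (E', E, p) from a Hitting Set instance.

    q      -- number of elements s_1..s_q, represented by the indices 1..q
    family -- list of r sets of element indices, standing for S_1..S_r
    h      -- size bound of the Hitting Set instance
    A directed edge u -> v is stored as the pair (u, v).
    """
    r = len(family)
    e_total = {(("b", 0), ("a", 0))}                      # Rule 3, set 1
    for j in range(1, r + 1):
        e_total.add((("a", 0), ("b", j)))                 # Rule 3, set 2
    for i in range(1, q + 1):
        e_total.add((("a", i), ("b", 0)))                 # Rule 3, set 3
    for j, s_j in enumerate(family, start=1):
        for i in range(1, q + 1):
            if i in s_j:
                e_total.add((("b", j), ("a", i)))         # Rule 3, set 4: s_i in S_j
            else:
                e_total.add((("a", i), ("b", j)))         # Rule 3, set 5: s_i not in S_j
    e_supp = {(("a", 0), ("b", j)) for j in range(1, r + 1)}  # Rule 4 (assumed reading)
    p = h + r + 1                                         # Rule 5
    return e_total, e_supp, p

# Made-up instance: S = {s_1, s_2, s_3}, W = {{s_1, s_2}, {s_2, s_3}}, h = 1.
e_total, e_supp, p = build_reduction(3, [{1, 2}, {2, 3}], 1)
```

Each pair of vertices from the two sides receives exactly one directed edge, so the graph built for E' is complete bipartite with (q + 1)(r + 1) edges, and the routine runs in time proportional to that size.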

Lemma 4.3. If some set S' ⊆ S with |S'| ≤ h contains at least one element
in each S_j, then there is a set P ⊆ E' − E consisting of at most p edges that holds
Properties N1 and N2.

Proof. For each S_j, let s_{i_j} be an element in S' ∩ S_j; by the assumption of
this lemma, these elements exist. Next, let P_1 = {b_1 → a_{i_1}, ..., b_r → a_{i_r}} and
P_2 = {a_{i_1} → b_0, ..., a_{i_r} → b_0}; by Rule 3, these two sets exist. Now, let P =
P_1 ∪ P_2 ∪ {b_0 → a_0}. Note that P ⊆ E' − E. Since P_1 consists of r edges and P_2 consists
of at most |S'| edges, P has at most p edges. P holds Property N1 because E ∪ P
consists of the edges in the traversable cycles b_0 → a_0, a_0 → b_j, b_j → a_{i_j}, a_{i_j} → b_0.
Property N2 of P follows from the fact that P connects {a_0, b_1, ..., b_r}, which forms
the only connected component of H with more than one vertex. □

Lemma 4.4. If some set P ⊆ E' − E consisting of at most p edges holds Properties
N1 and N2, then there exists a set S' ⊆ S with |S'| ≤ h that contains at least one
element in each S_j.

Proof. By Property N1, P must contain some edge b_j → a_{i_j} for each j with
1 ≤ j ≤ r. By Rule 3(4), s_{i_j} ∈ S_j. Now let S' = {s_{i_1}, ..., s_{i_r}}. To calculate the
size of S', note that by Property N1, P must also contain b_0 → a_0 and at least
one edge leaving a_{i_j} for each j. Thus |P| ≥ |S'| + r + 1. Then |S'| ≤ h, because
|P| ≤ p = r + h + 1. □



The above lemma completes the proof of Theorem 4.1. 



5. Optimal suppression in almost linear time. Under the assumption that
the total graph of T is undirected, this section considers the following optimization
version of Problem 2.

Problem 5 (Optimal Suppression).

• Input: The suppressed graph H = (A, B, E) of a table T and a subset Q of
E.

• Output: A set P consisting of the smallest number of published cells in T
such that Q is totally protected in the table formed from T by also suppressing
the cells in P.

For all positive integers n and m, let α denote the best known function such that
m + n unions and finds of disjoint subsets of an n-element set can be performed in
O((m + n)·α(n, m + n)) time fi ||, ||, [Hi-

Theorem 5.1. Problem 5 can be solved in O((m + n)·α(n, m + n)) time, where
m is the number of suppressed cells and n is the total number of rows and columns in
T.

To prove Theorem 5.1, Problem 5 is first converted to the next problem.



Problem 6. 

• Input: An undirected bipartite graph H = (A, B, E) and a subset Q of E.

• Output: A forest P formed by the smallest number of undirected edges between
A and B but not in E such that the vertices of each connected component
of (A, B, Q) are connected in (A, B, (E − Q) ∪ P), i.e., contained in a
connected component of (A, B, (E − Q) ∪ P).

Lemma 5.2. Problems 5 and 6 can be reduced to each other in linear time.

Proof. The proof uses arguments similar to those in the proof of Lemma 4.2. The
strong connectivity properties in Problem 3 and Theorem 3.2 can be ignored because
this section assumes that the total graph of T is undirected. The forest structure of
P follows from its minimality. □
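Read with P empty, the connectivity requirement of Problem 6 suggests a simple check for the undirected case: the endpoints of every edge in Q should already be connected by edges of E − Q. The Python sketch below illustrates that check with a simplified union-find structure (path compression only). The class and function names and the vertex labels such as "R1" and "C1" are assumptions made for this example; it is not the linear-time test of Theorem 3.3 and it does not handle directed edges.

```python
class DisjointSets:
    """Minimal union-find with path compression (no union by rank)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:          # compress the path just walked
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)


def connected_without_q(edges, q_edges):
    """True if the endpoints of every edge in q_edges are connected using edges not in q_edges.

    edges and q_edges are collections of (row_vertex, column_vertex) pairs,
    with q_edges a subset of edges.
    """
    q_set = set(q_edges)
    sets = DisjointSets()
    for u, v in edges:
        if (u, v) not in q_set:
            sets.union(u, v)                   # only edges of E - Q may be used
    return all(sets.find(u) == sets.find(v) for u, v in q_set)


# Hypothetical suppressed graph: the 4-cycle R1 - C1 - R2 - C2 - R1.
square = [("R1", "C1"), ("R2", "C1"), ("R2", "C2"), ("R1", "C2")]
one_edge_ok = connected_without_q(square, [("R1", "C1")])
all_edges_ok = connected_without_q(square, square)
bridge_ok = connected_without_q([("R1", "C1")], [("R1", "C1")])
```

In the 4-cycle, a single edge of Q is bypassed by the other three edges, whereas taking Q to be all four edges leaves nothing in E − Q to connect their endpoints, which is the situation in which additional cells have to be suppressed.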

Note that because Q ⊆ E, the vertices of each connected component of (A, B, Q)
are connected in (A, B, (E − Q) ∪ P) if and only if the vertices of each connected
component of H are connected in (A, B, (E − Q) ∪ P). Using this equivalence, the
next stage of the proof of Theorem 5.1 further reduces Problem 6 to another graph
problem with the steps below:

M1. Compute the connected components D_1, ..., D_r of H.

M2. For each D_i, compute a maximal forest K_i over the vertices of D_i using only the
edges in E − Q.

M3. For each D_i, extend K_i to a maximal forest L_i over the vertices of D_i using
additional edges only from the complement graph of D_i.

M4. Construct a contracted graph from H by contracting each tree in each L_i into a
single vertex.

M5. For each D_i, compute its contracted version in the contracted graph.

M6. Divide the vertices of the contracted graph into three sets, V_A, V_B, V_AB, where a
vertex in V_A (respectively, V_B) consists of a single vertex from A (respectively, B), and a
vertex in V_AB contains at least two vertices (thus with at least one from each of
A and B).

A set of undirected edges between vertices in V_A, V_B, V_AB is called semi-tripartite
if every edge in that set is between two of the three sets or is between two vertices in
V_AB. Note that the set of edges in the contracted graph is semi-tripartite.

Problem 7.

• Input: Three disjoint finite sets V_A, V_B, V_AB, and a partition D_1, ..., D_r of
V_A ∪ V_B ∪ V_AB.

• Output: A semi-tripartite set P consisting of the smallest number of edges
such that no edge in P connects two vertices in the same D_i and the vertices
in each D_i are connected in the graph formed by P.

Lemma 5.3. Problem 6 can be reduced to Problem 7 in O((m + n)·α(n, m + n))
time, where m is the number of edges and n is the number of vertices in H.

Proof. The key idea is that an optimal P for Problem 6 can be obtained by
connecting the vertices of each D_i first with edges in E − Q, which can be used
for free, next with edges in the complement graph of D_i, and then with edges outside
D_i and its complement graph. Let P' be
a set of |P| edges in the complement of H that becomes P after the contraction. Then,
P' ∪ (L_1 − K_1) ∪ ··· ∪ (L_r − K_r) is a desired output P for Problem 6, showing that
Steps M1-M6 can indeed reduce Problem 6 to Problem 7. Step M3 is the only step
that requires more than linear time. It is important to avoid directly computing the
complement graphs at Step M3. Computing these complement graphs takes O(|A|·|B|) time if some D_i
contains a constant fraction of the vertices in H. In such a case, if H is sparse, then
the time spent on computing the complement graphs alone is far greater than the desired complexity.
Instead of this naive approach, Step M3 uses efficient techniques recently developed
for complement graph problems [20] and takes the desired O((m + n)·α(n, m + n))
time. □

The last stage of the proof of Theorem 5.1 is to give a linear-time algorithm for
Problem 7. A component D_i is good if it has at least two vertices with at least one
from V_AB; it is bad if it has at least two vertices with none from V_AB (and thus with
at least one from each of V_A and V_B). The goal is to use as few edges as possible to
connect the vertices in each of these components. Let w_g and w_b be the numbers of
good and bad components, respectively. There are three cases based on the value of
w_g.

Case 1: w_g = 0. If w_b = 0, then let P = ∅ because no D_i needs to be connected.
If w_b > 0 and |V_AB| > 0, then include in P an edge between each vertex in the bad
components and an arbitrary vertex in V_AB. If w_b > 0 and |V_AB| = 0, then there
does not exist a desired P and the given instance of Problem 7 has no solution.

Case 2: w_g = 1. Let D_j be the unique good component.

If w_b > 0, then find a bad component D_k, and three vertices u ∈ V_AB ∩ D_j,
v_1 ∈ V_A ∩ D_k, v_2 ∈ V_B ∩ D_k. Next, include in P an edge between v_2 and each vertex
in (D_j ∩ (V_A ∪ V_AB)) − {u}, an edge between v_1 and each vertex in D_j ∩ V_B, and an
edge between u and each vertex in the bad components.

If w_b = 0 and V_AB − D_j ≠ ∅, then include in P an edge between every vertex in
D_j and an arbitrary vertex in V_AB − D_j.

If w_b = 0 and V_AB − D_j = ∅, then there are sixteen subcases depending on
whether V_A ∩ D_j = ∅, V_A − D_j = ∅, V_B ∩ D_j = ∅, V_B − D_j = ∅. If V_A ∩ D_j ≠ ∅,
V_A − D_j ≠ ∅, V_B ∩ D_j ≠ ∅, V_B − D_j ≠ ∅, then include in P an edge between each
vertex in V_A ∩ D_j and a vertex v_2 ∈ V_B − D_j, an edge between each vertex in V_B ∩ D_j
and a vertex v_1 ∈ V_A − D_j, and an edge between v_1 and each vertex in V_AB ∪ {v_2}.
The other fifteen subcases are handled similarly.

Case 3: w_g ≥ 2. Let d be the total number of vertices in the good and bad
components. Let w' be the number of connected components in P that contain the
vertices of at least one good or bad D_i; let d' be the number of vertices in these
connected components of P that are not in any good or bad D_i. By its minimality, P
forms a forest and |P| = d' + d − w'. The techniques for Cases 1 and 2 can be used to
show that there exists an optimal P with d' = 0. Thus, to minimize |P| is to maximize
w'. Because two bad components cannot be connected by edges between them alone,
the strategy for maximizing w' is to pair a good component with a bad one, whenever
possible, and include in P edges between them to connect their vertices into a tree.
After this step, if there remain unconnected bad components but no unconnected good
ones, then add to P an edge between each vertex in the remaining bad components
and an arbitrary vertex in the intersection of V_AB and a good component. On the
other hand, if there remain good components but no bad ones, then pair up these good
components similarly. After this step, if there remains a good component, then add
to P an edge between each vertex in this last good component and an arbitrary vertex
in the intersection of V_AB and another good component. (As a result, if w_g < w_b,
then |P| = d − w_g; otherwise, |P| = d − ⌊(w_g + w_b)/2⌋.)

The above discussion yields a linear-time algorithm for Problem 7 in a straightforward
manner. This finishes the proof of Theorem 5.1.
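The edge count produced by the pairing strategy of Case 3 can be tallied with a few lines of Python. The sketch assumes, as stated above, that an optimal P with d' = 0 exists, so that |P| = d − w', and it only counts the trees created by the pairing; it does not construct P itself. The function name is an assumption made for this example.

```python
def case3_edge_count(d, w_g, w_b):
    """Count the edges used by the Case 3 pairing strategy, assuming d' = 0.

    d   -- total number of vertices in the good and bad components
    w_g -- number of good components (Case 3 needs at least two)
    w_b -- number of bad components
    """
    if w_g < 2:
        raise ValueError("Case 3 requires at least two good components")
    good_bad_pairs = min(w_g, w_b)         # each pair is connected into one tree
    leftover_good = w_g - good_bad_pairs   # good components with no bad partner
    good_good_pairs = leftover_good // 2   # paired among themselves, one tree per pair
    # Leftover bad components, or a single last good component, are attached
    # to an existing tree and so do not create new trees.
    trees = good_bad_pairs + good_good_pairs
    return d - trees                       # a forest on d vertices with this many trees
```

For example, with d = 10, two good components and three bad ones, the strategy forms two trees and uses 8 edges; with d = 12, five good components and one bad one, it forms three trees and uses 9 edges.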



6. Discussions. Lemma 5.2 has several significant implications. Since P is a
forest, it has at most n − 1 edges. Thus, for a table with an undirected total graph, no
more than n − 1 additional cells need to be suppressed to achieve total protection. This
is a small number compared to the size of the table, which may grow quadratically
in n. Moreover, when H is connected and E = Q, (A, B, P) is a spanning tree. In
this case, many well-studied tree-related computational concepts and tools, such as 
minimum-cost spanning trees, can be applied to consider other optimal suppression 
problems for total protection. 

Acknowledgements. The author is deeply grateful to Dan Gusfield for his con- 
stant encouragement and help. The author wishes to thank an anonymous referee 
for very helpful and thorough comments. The referee has also pointed out that some
very interesting materials related to Theorems 1.1 and 2.3 have been developed in the
context of protecting sums of suppressed cells [23], [24], [25].
Fig. 1.1. A Complete Table.

Note: Let X_{p,q} denote the cell at row p and column q. The lower and
upper bounds for all suppressed cells except X_{2,c} and X_{3,c} are −∞ and
+∞. The lower and upper bounds for X_{2,c} and X_{3,c} are 0 and 9.5.

In the above 3x3 table, the number in each cell is the value of that cell. A cell with a
box is a suppressed cell. The lower and upper bounds of the suppressed cells are 0 and 9.
The graph below the table is the suppressed graph of the table. Vertex R_p corresponds to
row p, and vertex C_q to column q.



Fig. 2.1. A Table and Its Suppressed Graph. 



